STATISTICS FOR POLITICAL ANALYSIS II


Factor Analysis I: Exploration


We use factor analysis to reduce a set of variables to a smaller set of summary variables. While clustering grouped rows, factor analysis groups columns. As with clustering, we want to know whether the new variables can be given a name; technically, each of them is called a latent variable. In this session we will explore the data to see what emerges.

Data preparation:

In this session we will work with the data from these links:

library(htmltab)

# links
happyL=c("https://en.wikipedia.org/wiki/World_Happiness_Report",
         '//*[@id="mw-content-text"]/div/table/tbody')
demoL=c("https://en.wikipedia.org/wiki/Democracy_Index", 
        '//*[@id="mw-content-text"]/div/table[2]/tbody')

# load
happy = htmltab(doc = happyL[1],which  = happyL[2],encoding = "UTF-8")
demo  = htmltab(doc = demoL[1], which  = demoL[2], encoding = "UTF-8")

# cleaning

happy[,]=lapply(happy[,], trimws,whitespace = "[\\h\\v]") # no blanks
demo[,]=lapply(demo[,], trimws,whitespace = "[\\h\\v]") # no blanks

library(stringr) # simple names: keep only the first word of each column name
names(happy)=str_split(names(happy)," ",simplify = TRUE)[,1]
names(demo)=str_split(names(demo)," ",simplify = TRUE)[,1]


## Formatting

# Drop the columns we will not use this time:
happy$Overall=NULL
demo[,c(1,9,10)]=NULL

# The score columns also need different names before the merge:

names(happy)[names(happy)=="Score"]="ScoreHappy" 
names(demo)[names(demo)=="Score"]="ScoreDemo"


# Variable types:

## In demo:
demo[,-c(1)]=lapply(demo[,-c(1)],as.numeric)

## In happy:
happy[,-c(1)]=lapply(happy[,-c(1)],as.numeric)

# no missing values:
happy=na.omit(happy)
demo=na.omit(demo)

Pay attention to the merge. We usually run merge with its defaults and end up losing several rows:

nrow(merge(happy,demo))
## [1] 147

Let's do a new merge that keeps ALL countries, including those that are missing from one data frame or the other:

HappyDemo=merge(happy,demo,all.x=T, all.y=T)
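
The difference between the two merges can be seen on a toy case. This is only a sketch: the data frames `a` and `b` and their values are invented for illustration and are not part of the course data.

```r
# Hypothetical toy tables; names and values are invented for illustration.
a <- data.frame(Country = c("Peru", "Chile", "Swaziland"), x = c(1, 2, 3))
b <- data.frame(Country = c("Peru", "Chile", "Eswatini"), y = c(10, 20, 30))

inner <- merge(a, b)               # default: only countries present in both
full  <- merge(a, b, all = TRUE)   # same as all.x = TRUE, all.y = TRUE

nrow(inner)
nrow(full)

# Rows with at least one NA are the countries that failed to match
unmatched <- full[!complete.cases(full), ]
unmatched$Country
```

The full merge keeps the two spellings of the same country as separate, incomplete rows, which is what makes the mismatches visible.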

This time HappyDemo has several extra countries, but they come with missing values and with names that failed to match. Let's look:

# formatted version of:
# HappyDemo[!complete.cases(HappyDemo),]

library(knitr)
library(kableExtra)
# kable() takes the output type through its `format` argument
kable(HappyDemo[!complete.cases(HappyDemo),],format='html')%>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                  font_size = 10)
Country ScoreHappy GDP Social Healthy Freedom Generosity Perceptions ScoreDemo Electoral Functioning Politicalparticipation Politicalculture Civilliberties
4 Angola NA NA NA NA NA NA NA 3.62 1.75 2.86 5.56 5.00 2.94
26 Cape Verde NA NA NA NA NA NA NA 7.88 9.17 7.86 6.67 6.88 8.82
33 Congo (Brazzaville) 4.812 0.673 0.799 0.508 0.372 0.105 0.093 NA NA NA NA NA NA
34 Congo (Kinshasa) 4.418 0.094 1.125 0.357 0.269 0.212 0.053 NA NA NA NA NA NA
37 Cuba NA NA NA NA NA NA NA 3.00 1.08 3.57 3.33 4.38 2.65
40 Democratic Republic of the Congo NA NA NA NA NA NA NA 1.49 0.50 0.71 2.22 3.13 0.88
42 Djibouti NA NA NA NA NA NA NA 2.87 0.42 1.79 3.89 5.63 2.65
47 Equatorial Guinea NA NA NA NA NA NA NA 1.92 0.00 0.43 3.33 4.38 1.47
48 Eritrea NA NA NA NA NA NA NA 2.37 0.00 2.14 1.67 6.88 1.18
50 Eswatini NA NA NA NA NA NA NA 3.03 0.92 2.86 2.22 5.63 3.53
52 Fiji NA NA NA NA NA NA NA 5.85 6.58 5.36 6.11 5.63 5.59
63 Guinea-Bissau NA NA NA NA NA NA NA 1.98 1.67 0.00 2.78 3.13 2.35
64 Guyana NA NA NA NA NA NA NA 6.67 9.17 5.71 6.11 5.00 7.35
83 Kosovo 6.100 0.882 1.232 0.758 0.489 0.262 0.006 NA NA NA NA NA NA
115 North Korea NA NA NA NA NA NA NA 1.08 0.00 2.50 1.67 1.25 0.00
117 Northern Cyprus 5.718 1.263 1.252 1.042 0.417 0.191 0.162 NA NA NA NA NA NA
119 Oman NA NA NA NA NA NA NA 3.04 0.00 3.93 2.78 4.38 4.12
121 Palestine NA NA NA NA NA NA NA 4.39 3.83 2.14 7.78 4.38 3.82
122 Palestinian Territories 4.696 0.657 1.247 0.672 0.225 0.103 0.066 NA NA NA NA NA NA
124 Papua New Guinea NA NA NA NA NA NA NA 6.03 6.92 6.07 3.89 5.63 7.65
131 Republic of the Congo NA NA NA NA NA NA NA 3.31 3.17 2.50 3.89 3.75 3.24
142 Somalia 4.668 0.000 0.698 0.268 0.559 0.243 0.270 NA NA NA NA NA NA
145 South Sudan 2.853 0.306 0.575 0.295 0.010 0.202 0.091 NA NA NA NA NA NA
148 Sudan NA NA NA NA NA NA NA 2.15 0.00 1.79 2.78 5.00 1.18
149 Suriname NA NA NA NA NA NA NA 6.98 9.17 6.43 6.67 5.00 7.65
150 Swaziland 4.212 0.811 1.149 0.000 0.313 0.074 0.135 NA NA NA NA NA NA
158 Timor-Leste NA NA NA NA NA NA NA 7.19 9.08 6.79 5.56 6.88 7.65
160 Trinidad & Tobago 6.192 1.231 1.477 0.713 0.489 0.185 0.016 NA NA NA NA NA NA
161 Trinidad and Tobago NA NA NA NA NA NA NA 7.16 9.58 7.14 6.11 5.63 7.35

From the above, notice that, on the one hand, some countries are missing a whole block of indicators and, on the other, in many cases the same country is spelled differently in each table. We can recover some of them, but the fix has to be made in the original data:

# switch to the names used by the other table:
## in demo, use the names from happy
demo[demo$Country=="Democratic Republic of the Congo",'Country']="Congo (Kinshasa)"
demo[demo$Country=="Republic of the Congo",'Country']="Congo (Brazzaville)"
demo[demo$Country=="Trinidad and Tobago",'Country']="Trinidad & Tobago"
demo[demo$Country=="North Macedonia",'Country']="Macedonia"

## in happy, use the name from demo
happy[happy$Country=="Palestinian Territories",'Country']="Palestine"

After those adjustments, let's check:

HappyDemo=merge(happy,demo) # re-creating HappyDemo
nrow(HappyDemo)
## [1] 150

The merge now keeps 150 countries instead of 147; we will leave it at that.
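
Names that still fail to match can also be listed directly with `setdiff()` before deciding which ones to recode. The sketch below uses two short hypothetical vectors standing in for `happy$Country` and `demo$Country`.

```r
# Hypothetical country vectors standing in for happy$Country and demo$Country
happy_names <- c("Peru", "Congo (Kinshasa)", "Kosovo")
demo_names  <- c("Peru", "Democratic Republic of the Congo", "Cuba")

only_happy <- setdiff(happy_names, demo_names)  # in happy, not in demo
only_demo  <- setdiff(demo_names, happy_names)  # in demo, not in happy

only_happy
only_demo
```

Comparing the two lists side by side shows which entries are the same country under different names and which are truly absent from one source.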

Evaluating the data

Factor analysis requires a few preliminary checks.

  1. Compute the correlation matrix:
theData=HappyDemo[,-c(1,2,9)] # without the Scores or the country name.

# here it is:
library(polycor)
corMatrix=polycor::hetcor(theData)$correlations
  2. Explore the correlations:
library(ggcorrplot)

ggcorrplot(corMatrix)

# p-values must be computed from the data, not from the correlation matrix
ggcorrplot(corMatrix,
          p.mat = cor_pmat(theData),
          insig = "blank")

If you can see blocks of correlated variables, there is hope for a good factor analysis.
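
To see what such blocks look like, here is a small simulated sketch in base R. The latent variables `f1` and `f2` and the data frame `toy` are hypothetical and unrelated to the course data.

```r
set.seed(123)
n  <- 200
f1 <- rnorm(n)   # hypothetical latent variable 1
f2 <- rnorm(n)   # hypothetical latent variable 2

# Two observed variables per latent variable, each with some noise
toy <- data.frame(a1 = f1 + rnorm(n, sd = 0.5),
                  a2 = f1 + rnorm(n, sd = 0.5),
                  b1 = f2 + rnorm(n, sd = 0.5),
                  b2 = f2 + rnorm(n, sd = 0.5))

# High correlations inside each pair, low correlations across pairs
round(cor(toy), 2)
```

Variables driven by the same latent variable correlate strongly with each other and weakly with the rest, which is the block pattern to look for in the plot.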

  3. Check whether the data allow factorization:
library(psych)
psych::KMO(corMatrix) 
## Kaiser-Meyer-Olkin factor adequacy
## Call: psych::KMO(r = corMatrix)
## Overall MSA =  0.86
## MSA for each item = 
##                    GDP                 Social                Healthy 
##                   0.84                   0.90                   0.88 
##                Freedom             Generosity            Perceptions 
##                   0.82                   0.59                   0.77 
##              Electoral            Functioning Politicalparticipation 
##                   0.80                   0.91                   0.94 
##       Politicalculture         Civilliberties 
##                   0.89                   0.85
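
As a rough sketch of what the overall MSA measures, it can be computed by hand from a correlation matrix: it compares the squared correlations with the squared partial correlations, and values closer to 1 are better. The helper `kmo_overall` and the simulated `toy` data are hypothetical, written only for illustration.

```r
# Overall KMO (MSA) from a correlation matrix R, using base R only
kmo_overall <- function(R) {
  Rinv <- solve(R)
  P <- -cov2cor(Rinv)   # partial correlations (off-diagonal entries)
  diag(P) <- 0
  R0 <- R
  diag(R0) <- 0
  sum(R0^2) / (sum(R0^2) + sum(P^2))
}

# Simulated data: five noisy indicators of one latent variable
set.seed(1)
f   <- rnorm(300)
toy <- sapply(1:5, function(i) f + rnorm(300))

k <- kmo_overall(cor(toy))
k
```

When variables share common factors, partial correlations are small relative to the raw correlations and the index approaches 1.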
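  4. Check whether the correlation matrix is adequate

There are two tests here: Bartlett's test, whose null hypothesis is that the correlation matrix is an identity matrix, and a check for a singular matrix. We want FALSE in both cases: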

cortest.bartlett(corMatrix,n=nrow(theData))$p.value>0.05
## [1] FALSE
library(matrixcalc)

is.singular.matrix(corMatrix)
## [1] FALSE
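
For intuition, Bartlett's statistic can be sketched in base R from the determinant of the correlation matrix. The function `bartlett_sphericity` and the simulated data are hypothetical illustrations, not the course's code.

```r
# Bartlett's test of sphericity for a correlation matrix R from n observations
bartlett_sphericity <- function(R, n) {
  p     <- ncol(R)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df    <- p * (p - 1) / 2
  c(chisq = chisq, df = df,
    p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Simulated, correlated columns
set.seed(2)
f   <- rnorm(150)
toy <- sapply(1:4, function(i) f + rnorm(150))

res <- bartlett_sphericity(cor(toy), n = 150)
res

# Crude non-singularity check: the determinant is away from zero
det(cor(toy)) > 1e-8
```

A determinant near 1 (identity matrix) gives a small statistic and a large p-value; a determinant of exactly zero would mean a singular matrix that cannot be inverted.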
  5. Determine how many factors or latent variables the data could be reduced to:
fa.parallel(theData,fm = 'ML', fa = 'fa')

## Parallel analysis suggests that the number of factors =  3  and the number of components =  NA
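
The idea behind parallel analysis can be sketched in base R: keep only the dimensions whose eigenvalues exceed those obtained from random, uncorrelated data of the same size. This is a simplified version based on principal-component eigenvalues, so it is an assumption-laden illustration and need not reproduce what `fa.parallel` reports with `fa = 'fa'`. The data are simulated.

```r
set.seed(3)
n  <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)

# Six simulated variables: three driven by f1, three by f2
toy <- cbind(sapply(1:3, function(i) f1 + rnorm(n, sd = 0.6)),
             sapply(1:3, function(i) f2 + rnorm(n, sd = 0.6)))

obs <- eigen(cor(toy))$values

# Eigenvalues from random data of the same shape (100 replicates)
sims <- replicate(100, eigen(cor(matrix(rnorm(n * ncol(toy)), n)))$values)
ref  <- apply(sims, 1, quantile, probs = 0.95)

n_keep <- sum(obs > ref)   # dimensions that beat the random benchmark
n_keep
```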

Three are suggested; let's try that:

  6. Reduce the data to a smaller number of factors
library(GPArotation)
resfa <- fa(theData,nfactors = 3,cor = 'mixed',rotate = "varimax",fm="minres")
## 
## mixed.cor is deprecated, please use mixedCor.
print(resfa$loadings)
## 
## Loadings:
##                        MR1    MR3    MR2   
## GDP                     0.275  0.889  0.105
## Social                  0.326  0.730       
## Healthy                 0.346  0.829  0.133
## Freedom                 0.195  0.348  0.498
## Generosity                    -0.139  0.586
## Perceptions                    0.253  0.684
## Electoral               0.938  0.169       
## Functioning             0.752  0.446  0.334
## Politicalparticipation  0.721  0.325  0.102
## Politicalculture        0.545  0.308  0.502
## Civilliberties          0.915  0.318       
## 
##                  MR1   MR3   MR2
## SS loadings    3.443 2.744 1.477
## Proportion Var 0.313 0.249 0.134
## Cumulative Var 0.313 0.562 0.697
print(resfa$loadings,cutoff = 0.51)
## 
## Loadings:
##                        MR1    MR3    MR2   
## GDP                            0.889       
## Social                         0.730       
## Healthy                        0.829       
## Freedom                                    
## Generosity                            0.586
## Perceptions                           0.684
## Electoral               0.938              
## Functioning             0.752              
## Politicalparticipation  0.721              
## Politicalculture        0.545              
## Civilliberties          0.915              
## 
##                  MR1   MR3   MR2
## SS loadings    3.443 2.744 1.477
## Proportion Var 0.313 0.249 0.134
## Cumulative Var 0.313 0.562 0.697

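When each variable loads clearly on a single factor, we have a simple structure. Here that holds for every variable except Freedom, which stays below the 0.51 cutoff on all three factors.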
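
A simple structure can also be checked programmatically. The sketch below uses base R's `factanal` (maximum-likelihood factor analysis) as a stand-in for `psych::fa`, on simulated data; the 0.5 cutoff and all object names are assumptions made for illustration.

```r
set.seed(4)
n  <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- data.frame(a1 = f1 + rnorm(n, sd = 0.6), a2 = f1 + rnorm(n, sd = 0.6),
                  a3 = f1 + rnorm(n, sd = 0.6), b1 = f2 + rnorm(n, sd = 0.6),
                  b2 = f2 + rnorm(n, sd = 0.6), b3 = f2 + rnorm(n, sd = 0.6))

fit <- factanal(toy, factors = 2, rotation = "varimax")
L   <- unclass(fit$loadings)   # plain matrix of loadings

# Factor on which each variable loads most strongly
dominant <- apply(abs(L), 1, which.max)
dominant

# Simple structure: every variable has exactly one loading above the cutoff
simple <- all(rowSums(abs(L) > 0.5) == 1)
simple
```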

fa.diagram(resfa)

  7. Evaluating the result:
resfa$crms
## [1] 0.04098225
resfa$RMSEA
##      RMSEA      lower      upper confidence 
## 0.09565491 0.06006255 0.12401922 0.90000000
resfa$TLI
## [1] 0.9444927
sort(resfa$communality)
##             Generosity                Freedom            Perceptions 
##              0.3629227              0.4067030              0.5363506 
## Politicalparticipation       Politicalculture                 Social 
##              0.6351970              0.6440077              0.6489500 
##                Healthy            Functioning                    GDP 
##              0.8242328              0.8751018              0.8776003 
##              Electoral         Civilliberties 
##              0.9086463              0.9436943
sort(resfa$complexity)
##              Electoral             Generosity                    GDP 
##               1.066692               1.114800               1.219569 
##         Civilliberties            Perceptions                Healthy 
##               1.250343               1.291734               1.397214 
##                 Social Politicalparticipation            Functioning 
##               1.423394               1.435972               2.063637 
##                Freedom       Politicalculture 
##               2.134773               2.580199
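
Communality and complexity follow directly from the loadings. As a worked sketch, take the three GDP loadings printed earlier: the communality is the sum of the squared loadings, and the complexity is Hofmann's index, which equals 1 when a variable loads on a single factor. Because the printed loadings are rounded to three decimals, the results only approximate the 0.8776 and 1.2196 reported for GDP.

```r
# GDP loadings as printed in the loadings table (rounded to three decimals)
gdp <- c(MR1 = 0.275, MR3 = 0.889, MR2 = 0.105)

communality <- sum(gdp^2)                 # share of GDP's variance explained
complexity  <- sum(gdp^2)^2 / sum(gdp^4)  # Hofmann's index

communality
complexity
```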
  8. Possible projected values (factor scores):

What names would you give them?

as.data.frame(resfa$scores)
##              MR1          MR3         MR2
## 1   -0.448363338 -1.505210390 -1.26233138
## 2    0.435857546  0.041076923 -0.77532536
## 3   -1.012984447  0.499826649 -0.86436092
## 4    0.727110953  0.478631634 -0.88481574
## 5   -0.108905556  0.129521455 -1.08047508
## 6    1.209836251  0.743904699  1.52266196
## 7    0.868921793  0.890280754  0.61984998
## 8   -1.517245634  0.776560237 -0.44027691
## 9   -1.957444919  1.497142249  0.25276075
## 10   0.147990329 -0.729674677  0.13247415
## 11  -1.633266434  0.913766984 -0.06376098
## 12   0.795221608  0.874883702  0.54820224
## 13   0.492006837 -1.767857078  0.35658723
## 14  -0.044475529 -0.513628262  0.90105909
## 15   0.419083675 -0.235354652 -0.71386360
## 16   0.055377983  0.210544432 -1.26810322
## 17   1.150109575 -0.363216027 -0.16077381
## 18   0.899102011  0.164147146 -0.95896609
## 19   0.822482531  0.444349039 -1.28198545
## 20   0.007435514 -1.448225162  0.35246166
## 21  -1.126408899 -1.861885744  0.47659243
## 22  -1.153980334 -0.285016060  1.09156940
## 23  -0.904823690 -0.986876367 -0.05874764
## 24   1.170021390  0.767225839  1.69127178
## 25  -0.812966741 -2.566557433 -0.64467300
## 26  -1.230289989 -1.450916081 -0.35637621
## 27   1.097595618  0.369968905  0.16112343
## 28  -1.981924509  1.026919761  0.74173876
## 29   0.829310347  0.231463766 -0.60509420
## 30  -0.285485991 -1.472150862 -0.30908408
## 31  -0.907035332 -0.550827260 -0.42438306
## 32  -1.487855906 -1.341784474 -0.10137891
## 33   1.055278202  0.369249685  0.13260399
## 34   0.533964447  0.540013827 -1.07365035
## 35   0.835450512  0.753142841 -0.52147656
## 36   0.818697766  0.805003929 -0.77095245
## 37   1.023434499  0.717208346  2.06028983
## 38   0.529713990  0.202810614 -0.60589467
## 39   0.438622982  0.184827603 -0.63522772
## 40  -1.027360811  0.140042715 -0.79702881
## 41   0.612234049 -0.165745769 -1.15079920
## 42   0.891271647  0.601147787  0.19490127
## 43  -1.161624135 -0.825287830  0.79308548
## 44   1.142433873  0.718269340  1.54074059
## 45   0.814228682  0.980854581 -0.24209701
## 46  -1.057407386  0.381134990 -0.88466688
## 47  -0.317698571 -1.491959508  1.02794162
## 48   0.128275369 -0.302261454 -0.84245526
## 49   1.059970335  0.745807790  0.98428511
## 50   0.675499683 -1.215043577  0.15834910
## 51   0.955371318  0.538443322 -1.59547812
## 52   0.298729066 -0.160906278 -0.29310414
## 53  -0.770403810 -1.299951253 -0.35768600
## 54   0.421900775 -1.884678012  0.34853125
## 55   0.375992213 -0.379959579 -0.26130280
## 56  -0.191989570  1.393057194  1.11503886
## 57   0.508186625  0.542620323 -1.02964948
## 58   1.191684960  0.843910136  1.30567174
## 59   0.928298988 -0.920366429  0.17278564
## 60   0.169401302 -0.271919022  1.00517028
## 61  -1.918430002  0.704267840  0.19669459
## 62  -0.752631206  0.027771812 -1.19020040
## 63   1.091541916  0.886678147  1.44845839
## 64   0.315454449  0.865213041  0.31818437
## 65   0.826095116  0.906123167 -0.94441159
## 66  -0.332403393 -1.366101111 -0.01105133
## 67   0.913630851  0.009196477 -0.29750647
## 68   0.760457539  1.065050330  0.03608209
## 69  -0.924291519  0.325787555 -0.04775219
## 70  -1.625819305  1.092595311 -0.34753866
## 71  -0.338953280 -0.857289178  1.02952608
## 72  -1.369464604  1.470318666 -0.28748581
## 73  -0.057190086 -0.416671181 -0.33003668
## 74  -1.765262163 -0.147463519  1.04518517
## 75   0.992328923  0.397587038 -0.90903761
## 76  -0.615168503  0.452174054 -0.89173688
## 77   1.013915456 -1.740540012 -0.26654311
## 78   0.559741792 -1.987727856 -0.22473088
## 79  -1.685392689  0.640420878 -0.34140852
## 80   1.036914460  0.562279534 -1.13787622
## 81   0.997262541  1.121924242  1.13711348
## 82   0.206206140 -1.417934823 -0.35400927
## 83   0.460784483 -1.891468342  0.49034840
## 84   0.127231046  0.513200069  0.35487074
## 85   0.578732915 -1.604991834 -0.31529628
## 86   0.832783702  0.700639168  1.21556096
## 87  -0.419067857 -0.652822448 -0.72523915
## 88   1.122836735  0.158983330  0.51704529
## 89   0.254077411  0.530602989 -0.97509381
## 90   0.548300518 -0.329326546 -0.95600322
## 91   0.698979923 -0.111025762 -0.59791742
## 92   0.162140653  0.509746727 -0.79198247
## 93  -0.438651312 -0.095484654 -0.27517566
## 94  -0.542094730 -1.497357080  0.48449961
## 95  -1.090087704 -0.578804792  1.84171690
## 96   0.502764782 -0.325029762 -0.44593606
## 97  -0.048142619 -0.736893894  0.77241914
## 98   0.989789360  0.783568753  1.59741701
## 99   1.271741248  0.600193003  1.87806075
## 100 -0.955774747  0.156842049  0.06481200
## 101 -0.026427854 -1.898223536 -0.38834364
## 102 -0.153029770 -1.022286876 -0.15761003
## 103  1.151388226  0.930239431  1.93447917
## 104 -0.116901106 -0.765787620 -0.23018601
## 105 -0.599161598 -0.187792395 -0.83996073
## 106  0.697996503  0.617509032 -0.84602967
## 107  0.603588554 -0.012737087 -0.41904362
## 108  0.661819881  0.179321459 -0.99234193
## 109  0.695707569 -0.344026505 -0.39053804
## 110  0.559053306  0.704913325 -0.95019389
## 111  0.940992323  0.820458025 -0.64421204
## 112 -1.889479925  1.941034269  0.62972432
## 113  0.573369863  0.478039571 -1.24429827
## 114 -1.393160649  1.079538842 -1.41108860
## 115 -1.039580742 -1.052058551  2.17771827
## 116 -2.270615142  1.630504363 -0.37030869
## 117  0.655796018 -1.142008108  0.12234784
## 118  0.563980216  0.296919014 -0.97006374
## 119  0.297328958 -2.023599468 -0.17062056
## 120 -0.351480225  1.649827264  1.91497597
## 121  0.746184523  0.696133598 -0.95566057
## 122  0.708650402  0.840023327 -0.41732062
## 123  0.831985970 -0.335620014 -0.20120855
## 124  0.882232794  0.723735446 -0.35759723
## 125  0.880442528  0.926887655 -0.30460694
## 126  0.217172341  0.118068727  0.02804046
## 127  1.061431795  0.702282525  2.14207421
## 128  0.917615645  0.939304546  1.81019765
## 129 -1.881972055 -0.946463679  0.15327054
## 130  1.011745087  0.737206608 -0.20300808
## 131 -1.916101434 -0.252043497  0.71549347
## 132  0.088056964 -1.308498166  0.75571207
## 133 -0.617739539  0.669607203  0.25882623
## 134 -0.688909894 -1.546058514 -0.22552850
## 135  0.647817400  0.416648073 -0.42899140
## 136  0.192572939  0.040005697 -0.57629892
## 137 -1.098641190  0.866713628 -0.35353389
## 138 -2.269060367  0.875261230 -0.17607996
## 139  0.227994072 -1.444571636  0.37464732
## 140  0.164098634 -0.072143160 -0.85650947
## 141 -2.068040244  1.614321935  0.86095558
## 142  1.002311633  0.658972104  1.13401659
## 143  0.712520417  0.787703995  0.32667956
## 144  1.252890503  0.330420021  0.36019307
## 145 -2.133188960  0.419666632  1.39928036
## 146 -1.226464789  0.754504485 -1.13953813
## 147 -1.647582289  0.573472513  0.47816425
## 148 -1.620179826 -0.849559396 -0.32581892
## 149  0.345289273 -1.159620886  0.50255178
## 150 -1.065366698 -0.900876793  0.33166415
HappyDemoFA=cbind(HappyDemo[1],as.data.frame(resfa$scores))

library(plotly)


plot_ly(data=HappyDemoFA, x = ~MR1, y = ~MR2, z = ~MR3, text=~Country) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'Demo'),
                     yaxis = list(title = 'Tranquility'),
                     zaxis = list(title = 'Well-being')))

RECAP:

library(fpc)
library(cluster)
library(dbscan)

# YOU NO LONGER NEED CMD for HappyDemoFA[,c(2:4)]

g.dist.cmd = daisy(HappyDemoFA[,c(2:4)], metric = 'euclidean')
kNNdistplot(g.dist.cmd, k=3)

To get an idea of what each cluster looks like:

resDB=fpc::dbscan(g.dist.cmd, eps=0.6, MinPts=3,method = 'dist')
HappyDemoFA$clustDB=as.factor(resDB$cluster)
aggregate(cbind(MR1, MR2,MR3) # dependent variables
          ~ clustDB, # grouping level
          data = HappyDemoFA,    # data
          max)            # operation
##   clustDB        MR1       MR2       MR3
## 1       0 -0.1919896 2.1777183 1.6498273
## 2       1  1.2717412 2.1420742 1.1219242
## 3       2 -1.0986412 0.4781642 1.4703187
## 4       3 -0.6177395 0.2588262 0.6696072
## 5       4 -1.8894799 0.8609556 1.9410343
plot_ly(data=HappyDemoFA, x = ~MR1, y = ~MR2, z = ~MR3, text=~Country, color = ~clustDB) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'Demo'),
                     yaxis = list(title = 'Tranquility'),
                     zaxis = list(title = 'Well-being')))

Unit II ends here; confirmatory factor analysis will be covered in the next Unit.


